9 Software Design Issues _f_o_r _t_h_e 9 _P_S-_1_8_6 _A_d_v_a_n_c_e_d _P_a_c_k_e_t _N_e_t_w_o_r_k _C_o_n_t_r_o_l_l_e_r Brian Kantor, WB6CYT _A_c_a_d_e_m_i_c _N_e_t_w_o_r_k _O_p_e_r_a_t_i_o_n_s _G_r_o_u_p _O_f_f_i_c_e _o_f _A_c_a_d_e_m_i_c _C_o_m_p_u_t_i_n_g _U_n_i_v_e_r_s_i_t_y _o_f _C_a_l_i_f_o_r_n_i_a, _S_a_n _D_i_e_g_o 9 December 6, 1988 - 2 - Abstract _A _f_a_s_t _n_e_t_w_o_r_k _f_o_r _a_m_a_t_e_u_r _r_a_d_i_o _r_e_q_u_i_r_e_s _s_o_p_h_i_s_t_i_c_a_t_e_d _n_o_d_e _c_o_n_t_r_o_l_l_e_r_s _t_o _w_o_r_k _w_e_l_l. _K_e_y _t_o _t_h_e _p_e_r_f_o_r_m_a_n_c_e _o_f _a_d_v_a_n_c_e_d _n_o_d_e _c_o_n_t_r_o_l_l_e_r _h_a_r_d_w_a_r_e _i_s _t_h_e _d_e_s_i_g_n _o_f _t_h_e _o_n-_b_o_a_r_d _s_o_f_t_w_a_r_e. _I_s_s_u_e_s _o_f _h_i_g_h_l_y- _e_f_f_i_c_i_e_n_t _d_e_v_i_c_e _d_r_i_v_e_r_s, _p_r_o_t_o_c_o_l _e_n_c_a_p_- _s_u_l_a_t_i_o_n, _a_n_d _p_r_o_c_e_s_s _m_a_n_a_g_e_m_e_n_t _m_u_s_t _b_e _a_d_d_r_e_s_s_e_d _t_o _e_n_s_u_r_e _a_c_c_e_p_t_a_b_l_e _p_e_r_f_o_r_m_a_n_c_e _w_i_t_h _l_i_m_i_t_e_d _m_e_m_o_r_y _a_n_d _a_f_f_o_r_d_a_b_l_e _h_a_r_d_w_a_r_e. _P_S-_1_8_6 _h_a_r_d_w_a_r_e _d_e_s_i_g_n _i_s_s_u_e_s _a_r_e _d_i_s_c_u_s_s_e_d _i_n _a _c_o_m_p_a_n_i_o_n _p_a_p_e_r _b_y _M_i_c_h_a_e_l _B_r_o_c_k, _F_r_a_n_k_l_i_n _A_n_t_o_n_i_o, _a_n_d _T_o_m _L_a_F_l_e_u_r. _1. _I_n_t_r_o_d_u_c_t_i_o_n A high-throughput data network must consist of both high speed links and fast network node controllers. To achieve the high throughput in the controller requires both good hardware and efficient software. The PS-186 offers a highly efficient hardware design including very high speed input/output and a fast processor. The PS-186's high-speed DMA channels allow much of the I/O to proceed in parallel with computation, thus overlapping I/O and processing that in a less sophisticated system might need to proceed serially. However, even the most advanced hardware can be crippled by inefficient software that wastes CPU and I/O resources rather than applying them to useful pro- cessing. To take full advantage of the PS-186 architecture, we have chosen to use a multi-tasking system that can support several programs running at once. By dividing up tasks into those that are time-critical and those that are not, we can set up the critical tasks in the sys- tem such that they will receive the required CPU attention. Less critical tasks will proceed as time permits. 9_________________________ Author's current address: _e_l_e_c_t_r_o_n_i_c: brian@sdcsvax.ucsd.edu _p_a_p_e_r: Office of Academic Computing B-028, La Jolla, CA 92093 USA December 6, 1988 - 3 - The PS-186 multi-tasking operating system is based in general terms upon the UNIX8r9 and other similar simple operating systems. In particular, many of the ideas and practices used have been taken from those presented in the MINIX [10] and XINU [2,3] model operating systems. _2. _S_o_f_t_w_a_r_e _O_r_g_a_n_i_z_a_t_i_o_n The PS-186 operating software can be divided into two categories. One of these is the central or core part of the system, referred to as the _k_e_r_n_e_l. It is responsi- ble for all supervisory low-level I/O functions, process and memory management, interrupt handling, and initialization, and time-critical tasks. The remaining software is termed the _u_s_e_r level software, although there are, strictly speaking, no users on this system. The true distinction is that while there may be many ``user'' processes running at one time, there is only one kernel. User processes can be stopped while they are waiting for input, or while the kernel is handling some other event such as an arriving packet. They are typically used for purposes that are not time- critical and that can operate indepen- dently of particular hardware status. A few examples of tasks that might better be placed in user processes are table look- ups, help menu displays, and the like. These are things that can proceed in parallel with other similar tasks. The kernel, on the other hand, is strictly _s_i_n_g_l_e-_t_h_r_e_a_d_e_d - it can only execute by itself, and is intended for those tasks that need to have exclusive access to the processor or devices, such as interrupt handlers and device drivers. User processes do NOT directly access devices, nor do they handle interrupts. All user processes communicate with the kernel by means of _s_y_s_t_e_m _c_a_l_l_s that cause the kernel to perform some task on behalf _________________________ UNIX is a registered trademark of AT&T Bell Labora- tories 9 December 6, 1988 - 4 - of the user process. A common example is a read or a write - data transfer between a user process and a device. The kernel is responsible for per- forming all encapsulating protocols below some arbitrary level, which we have chosen to be at the ``data stream'' level. That means that when an AX.25 connection is made to the PS-186, the kernel software is responsible for the acceptance, ack- nowledgement, and eventual knockdown of the connection. The kernel will extract the data field from the incoming AX.25 packet and make it available to a user process executing an appropriate _r_e_a_d. Likewise, a user process that wishes to send data over an open AX.25 connection will give the data to the kernel by means of a _w_r_i_t_e system call, and the kernel will do the encapsulation necessary to send the data via AX.25, and then queue it for transmittal by the appropriate device. When there are multiple protocols involved, such as TCP-IP on AX.25, the kernel does the multiple extractions and encapsulations as required, so that the user process again works only at the data level. The decision on whether to place a particular task in the kernel or leave it to user-level processing is based on a number of criteria, some of them empiri- cal. In general, any task which can wait to complete without impacting the perfor- mance of other tasks can generally be placed in a user-level process, whereas tasks that have a number of process depen- dencies pretty much have to go inside the kernel. Additionally, any protocol encap- sulation or unwrapping that does not gen- erate additional packets can be placed in the kernel, thereby making that function available to all user-level processes by means of a uniform system call. _3. _I_n_p_u_t-_O_u_t_p_u_t Each device in the PS-186 has a module of code associated with it in the kernel that does the low-level input- output interface to the actual hardware. 9 December 6, 1988 - 5 - This module is often referred to in the literature as a _d_e_v_i_c_e _d_r_i_v_e_r. At the low level hardware interface, a device driver is responsible for taking data to be output from some generalized system data structure, and actually out- putting it through the corresponding piece of hardware. It also must accept incoming data from the hardware device and place it into a system data structure for further processing. It is common for device drivers to operate in an _i_n_t_e_r_r_u_p_t-_d_r_i_v_e_n mode, with their actions being invoked in response to ``completion'' or ``ready'' signals from the hardware. We have chosen this method over a perhaps simpler scheme where the main software loop simply repetitively checks for device availabil- ity, because the latter scheme potentially wastes a tremendous portion of the avail- able processing power. Additionally, the various drivers make use of the PS-186's DMA (Direct Memory Access) capability to move the actual data between memory and the device without the need of the CPU to read and write every byte. There also must be a simple and con- sistent higher-level interface to the sys- tem data structures that the low-level device drivers access. We have chosen to implement this interface as _r_e_a_d and _w_r_i_t_e system calls that invoke high-level por- tions of the device drivers. Additionally, there are both high- and low-level confi- guration, status, and initialization func- tions that are logically part of the dev- ice driver. Thus each driver can be divided into two logical functions, referred to as the ``top'' and ``bottom'' of the driver. The ``bottom'' function is the inter- face to the hardware; it is invoked in response to an interrupt from the actual device. Typically its sole function is to move data to and from the device and an associated memory buffer or buffer queue. The ``top'' function is the interface to the kernel _r_e_a_d and _w_r_i_t_e system calls. It does the opposite of its corresponding ``bottom'' half; where the bottom half 9 December 6, 1988 - 6 - places incoming hardware data into a buffer, the top half will remove the data from the buffer and give it to the process executing the _r_e_a_d kernel call. When the PS-186 kernel has data to be sent out of a serial port (in response to a _w_r_i_t_e system call), the device driver is called to accept the outgoing data. The driver adds the data to the tail end of a queue of data waiting to be sent, sets a flag indicating that there is indeed data to be sent, and returns. Later, when the output device finishes with the data it was sending, it will cause an interrupt to occur, and the ``bottom half'' of the appropriate device driver will take the next chunk of data from the queue and send it to the device. Thus the kernel (and therefore user processes) need not wait for I/O on a device to complete before resuming proceeding. Data buffers and queues are dynami- cally allocated; when data is received or generated a ``buffer'' (a block of memory) is allocated from the pool of available memory to hold it, and a ``pointer'' that contains the memory address of the block is set up. To save the time that would be wasted in copying from one block of memory to another, data is passed from module to module by passing the pointer to the memory buffer in which the data resides, rather than copying the data itself. When the data is finally consumed, either by being output by a device, or copied into a user-level (outside the kernel) buffer by a _r_e_a_d system call, the memory space used is returned to the available memory pool, and the buffer pointer is dereferenced. When a call to the kernel with data to be output (a _w_r_i_t_e call) would require allocation of more memory buffer space than is allowed, the process making the call is stopped by the simply not return- ing from the system call to that process until there is space and the write can be completed. Since ``blocking'' the process in this manner does NOT stop interrupt service nor other kernel functions, the device will eventually output enough data to free up sufficient memory for the write 9 December 6, 1988 - 7 - to complete and for the user process to resume. On input, if a chunk of data arrives and there is no memory available to hold it, the only practical procedure is to simply discard the data. We anticipate having a large amount of memory available for data buffers, as well as expecting good throughput, so we do not anticipate that it will necessary to discard data often. As a practical note, we have decided to provide each input device with its own memory buffer limit so that no one device could hog all available memory and shut out input from other devices even in the most pathalogical of cases. The over- riding assumption is that higher-level protocols will handle packets lost due to memory congestion in much the same way that packets lost due to collisions or channel congestion are handled. A kernel _r_e_a_d call will return data from the input queue to the user process; if there is no data in the queue, the user process may elect to wait until there is (``read-wait'') or just return (``read- no-wait''). When data is available, it is copied into a buffer space provided by the user process (typically a character array), and the memory buffer space is released to be reused on subsequent input events. One can view the input-output streams as a series of filtered interfaces to the raw packets that are being received or sent. Thus it is possible to open a con- nection that consists of raw AX.25 frames, an AX.25 connected mode stream, IP packets in SLIP, IP packets in AX.25, TCP in IP in AX.25, etc. This is controlled by the parameters passed to the kernel in the _o_p_e_n system call. _4. _D_e_v_i_c_e_s The PS-186 devices that are most interesting are the several serial con- troller chips that form the communications interfaces. (There is an SCSI controller option for general device access, such as to a disk or floppy controller, but we 9 December 6, 1988 - 8 - will not discuss that here.) The serial controller chosen was the Zilog 8530 SCC; the hardware design considerations that lead to it being chosen are discussed in the companion paper on the hardware design of the PS-186. The 8530 SCC can do both asynchronous serial I/O (as perhaps to a terminal or printer), and HDLC synchronous, such as is used in the AX.25 protocol. Any of the PS-186's serial ports can be configured to operate in either of these modes. We therefore have a more complex device driver than if the PS-186 had fixed serial port allocations, since the device driver must be able to handle both sync- and async-configured devices based on parame- ters stored in a table. The driver is also responsible for setting up the modes of the serial ports in the first place. _5. _P_r_o_t_o_c_o_l _H_a_n_d_l_i_n_g Fundamental to the operation of com- munications protocols is the concept of _l_a_y_e_r_i_n_g or _e_n_c_a_p_s_u_l_a_t_i_o_n, whereby data is successively encapsulated or ``wrapped'' in layers of protocol as it is prepared for transmission, and ``unwrapped'' at its destination. The basic concept used is known as a _s_w_i_t_c_h. As a railroad switch controls the path of a train, the switch controls the path that data takes through the various levels of encapsulation and unwrapping. A _p_r_o_t_o_c_o_l _s_w_i_t_c_h makes the data passing decision based on a field contained in each protocol's header that indicates what kind of protocol may be further encapsu- lated within the data field of the current packet. The protocol handling scheme that we chose to use in the PS-186 is located in the kernel software. By keeping all proto- col wrapping and unwrapping inside the main single-thread portion of the operat- ing system and thus making them available to all processes on the system, we sim- plify greatly the amount of protocol han- dling required in the various other processes. 9 December 6, 1988 - 9 - Each PS-186 communications interface is configured at system startup time to handle one type of outermost protocol (for example, KISS AX.25 is appropriate for a serial interface to a radio link, or perhaps SLIP for a hardwire line.) As a packet is received from an interface, it is examined according to the rules that have been set up for that interface. When that packet has been received and the appropriate acknowledgements generated, the contents of the packet and selected fields extracted from its header are passed to the appropriate protocol switch. The protocol switch then examines the packet contents and routes the data further to the next protocol module, as appropriate. This process repeats until there is no further enclosed protocol, until the data has been fully extracted and is available in a buffer queue to be used by some user process. Not until the data has been fully extracted does it become ready to leave the kernel environ- ment. A concrete example may make this clearer: Suppose that we receive an AX.25 packet on a serial link that is attached to a radio. If it is for us (as shown by the destination callsign) we will ack- nowledge that AX.25 packet (if appropri- ate), and if it was a data packet (UI or I frame), we will pass it to the AX.25 pro- tocol switch. That switch will examine the Protocol ID byte that is part of the AX.25 packet. If the PID is for a stream connection (a normal mode AX.25 connection such as is commonly in use today for keyboard-to-keyboard typing), then the packet will be further switched by the AX.25 Stream switch, which will send it (based on the callsign in the source field of the AX.25 header, since there can be only one connection per source callsign) to the user process that is servicing that stream connection. If, instead, the PID is for ARP, RARP, RIP, or one of the other raw packet protocols, the data will be sent to the user process that handles that kind of packet - to build address or routing tables, for example. 9 December 6, 1988 - 10 - A packet with the PID indicating an encapsulated IP packet is passed to the module that does IP protocol - checksums and other integrity checks. If the packet is ok by IP standards, the IP module will call the IP protocol switch, which will examine the Protocol ID byte in the IP packet (distinct from the PID in the AX.25 packet). This will in turn route the IP packet to another protocol handler, such as UDP, ICMP, RDP, or TCP - whichever we have implemented. Again, those are expected to route the data based on fields in the headers of these protocols. TCP is an interesting example of imbedded protocols and switching. Each TCP connection as seen on a host is designated uniquely by a 64-bit number that is the concatenation of the distant host's Inter- net address (32 bits) and the local and distant TCP logical port numbers (16 bits each). Since when a TCP connection is initiated, the originating host must chose a new (not currently nor recently used) logical source port number, there can be multiple logical connections between TCPs on the same two hosts even to the same distant port. The data switch in the receiving TCP is required to separate out the streams of data based upon the 64-bit stream identifier, and deliver each to potentially separate user processes as appropriate. _6. _P_r_o_c_e_s_s _C_o_n_t_r_o_l The PS-186 is organized as a multi- tasking system, implying that more than one process may be running at a time. There is always a ``null'' process that is constantly ready to run; when there is no other process ready to run, the null pro- cess is active. Processes are created as needed and destroyed when no longer needed. In this manner, resources are not consumed on idling processes that are merely sitting around waiting in case they are needed. For example, when an AX.25 connection is made to the PS-186 network node, a process is started to handle incoming stream data. This process will exit and its resources 9 December 6, 1988 - 11 - will be deallocated when the connection is closed. Each such connection will cause a separate process to be spawned. User-level processes do not perform I/O operations to devices; they instead make _s_y_s_t_e_m _c_a_l_l_s to the kernel that invoke the required I/O. When a process makes a system call that would require some time to complete (such as I/O), that process is _b_l_o_c_k_e_d - that is, placed in suspended state, and another process is resumed. Periodically (in response to interrupts from the system real-time clock), the current process will be suspended and another selected to run. Thus no process can hog CPU resources, and I/O can proceed in parallel with ordinary processing. We feel that multitasking is a supe- rior method in this application, although it is much more complex than a single- threaded program, because much of what goes on in a device like the PS-186 is not time-critical, and we can therefore devote the CPU to high-priority events (such as the arrival and buffering of a packet) that are truly critical. _7. _C_o_n_c_l_u_s_i_o_n We feel that the PS-186 Advanced Packet Controller represents a significant step towards the construction of an effi- cient and practical amateur radio data network. By combining fast hardware and efficient software into a flexible package that accomodate today's and tomorrow's protocols, we believe we have advanced the network one step further along the road to completion. _8. _R_e_f_e_r_e_n_c_e_s [1] AT&T, ``Communications Protocol Specification BX.25'', Publication 54001 Issue 2 (June 1980) [2] Comer, D., _O_p_e_r_a_t_i_n_g _S_y_s_t_e_m _D_e_s_i_g_n - _t_h_e _X_I_N_U _A_p_p_r_o_a_c_h, Prentice-Hall (1984) 9 December 6, 1988 - 12 - [3] Comer, D., _O_p_e_r_a_t_i_n_g _S_y_s_t_e_m _D_e_s_i_g_n - _I_n_t_e_r_n_e_t_w_o_r_k_i_n_g _w_i_t_h _X_I_N_U, Prentice- Hall (1987) [4] DEC/Intel/Xerox, ``The Ethernet - A Local Area Network Data Link Layer and Physical Layer Specification.'', Version 1.0 (Sep 30 1980) [5] Fox, T. L., ``AX.25 Amateur Packet- Radio Link-Layer Protocol'', Version 2.0, ARRL (Oct 1984) [6] Griffiths, Georgia, and G. Carlyle Stones, ``The Tea-Leaf Reader Algo- rithm: An Efficient Implementation of CRC-16 and CRC-32'', Communications of the ACM, 30,7 (July 1987) [7] IEEE, _L_o_g_i_c_a_l _L_i_n_k _C_o_n_t_r_o_l, ANSI/IEEE Std 802.2-1985 (1984) [8] Postel, J. et al., ``DDN Protocol Handbook'', USC-ISI (1986) [9] Tanenbaum, A., _C_o_m_p_u_t_e_r _N_e_t_w_o_r_k_s, Prentice-Hall (1981) [10] Tanenbaum, A., _O_p_e_r_a_t_i_n_g _S_y_s_t_e_m_s - _D_e_s_i_g_n _a_n_d _I_m_p_l_e_m_e_n_t_a_t_i_o_n, Prentice- Hall (1987) 9 December 6, 1988